Abstract 

Background: Klebsiella pneumoniae strains are pathogenic to animals and humans, in which they are both a frequent 
cause of nosocomial infections and a re-emerging cause of severe community-acquired infections. K. pneumoniae 
isolates of the capsular serotype K2 are among the most virulent. In order to identify novel putative virulence factors 
that may account for the severity of K2 infections, the genome sequence of the K2 reference strain Kp52.145 was 
determined and compared to two K1 and K2 strains of low virulence and to the reference strains MGH 78578 and 
NTUH-K2044. 

Results: In addition to diverse functions related to host colonization and virulence encoded in genomic regions 
common to the four strains, four genomic islands specific for Kp52.145 were identified. These regions encoded genes 
for the synthesis of colibactin toxin, a putative cytotoxin outer membrane protein, secretion systems, nucleases and 
eukaryotic-like proteins. In addition, an insertion within a type VI secretion system locus included
Sel1 domain-containing proteins and a phospholipase D family protein (PLD1). The pld1 mutant was avirulent in a
mouse model of pneumonia. The pld1 mRNA was expressed in vivo and the pld1 gene was associated with K. pneumoniae
isolates from severe infections. Analysis of the lipid composition of a defective E. coli strain complemented with pld1
suggests an involvement of PLD1 in cardiolipin metabolism. 

Conclusions: Determination of the complete genome of the K2 reference strain identified several genomic islands 
comprising putative elements of pathogenicity. The role of PLD1 in pathogenesis was demonstrated for the first time 
and suggests that lipid metabolism is a novel virulence mechanism of K. pneumoniae. 

Background 

Klebsiella pneumoniae is responsible for a variety of diseases
in humans and animals. As a prominent nosocomial pathogen, it is
mainly responsible for urinary tract, respiratory tract or blood
infections [1-3]. In addition, because of the acquisition of
extended-spectrum β-lactamases and carbapenemases, such as the
recently described NDM-1 [4], multi-, extremely or pan-drug
resistant clinical strains are more frequently isolated [5,6].
Moreover, K. pneumoniae has re-emerged as a cause of
community-acquired infections including pneumonia and the
characteristic syndrome of pyogenic liver abscess, with possible
complications including endophthalmitis or meningitis [7,8].
K. pneumoniae is thus an important virulent pathogen able to
cause serious infections in ambulatory, otherwise healthy hosts
and to spread within patients [5,9], which calls for a better
understanding of the molecular mechanisms underlying the various
forms of its pathogenesis. 

Major K. pneumoniae virulence factors include the capsule, the
lipopolysaccharide, iron scavenging systems and adhesins
[3,10-14]. The capsule is one of the most important virulence
determinants, protecting against serum bactericidal activity,
antimicrobial peptides and phagocytosis [11,15-18]. At least 77
capsular (K) types can be distinguished, but types K1 and K2 are
prominent by their virulence in murine models of infection
[19-21] and by their epidemiological prevalence [9,18,21,22].
However, not all K1/K2-type strains are necessarily virulent, as
distinct clonal groups of K1 and K2 differ sharply in their
virulence [23]. Reference strain Kp52.145 (derived from B5055,
the reference strain of serotype K2) is a highly virulent strain
from which important virulence factors, including the large
virulence plasmid harboring the regulator of mucoid phenotype
(rmpA) and the aerobactin cluster, were identified [10,11,21].
Even though a virulence plasmid-cured strain is less virulent
than the parental strain, it remains more virulent than other
isolates that do not harbor this plasmid [21], showing that
factors other than capsule overexpression and aerobactin account
for the higher pathogenicity of this strain. Therefore, although
known virulence factors are certainly crucial for bacterial
survival, protection and interaction with the host, putative new
virulence factors that could subvert host cell physiology and
response remain to be identified. 

Comparative genomics of pathogenic and non-pathogenic strains is
a powerful approach to identify putative virulence genes. Several
draft or complete genomes of clinical isolates of K. pneumoniae
have been published so far, but only the virulent serotype K1
strain NTUH-K2044 [24] has been described in detail. In order to
identify new K. pneumoniae K2 virulence factors, we sequenced the
genome of the virulent strain Kp52.145, as well as two additional
strains of low virulence, SB2390 (serotype K2) and SB3193
(serotype K1). By comparing these three novel genomes to the
publicly available genomes of the virulent K1 strain NTUH-K2044
and reference strain MGH 78578, we identified in Kp52.145
putative virulence genes and analyzed their distribution within a
diverse collection of K. pneumoniae strains. We demonstrate that
a gene coding for a phospholipase D family protein (PLD1),
located within a type VI secretion system locus, is expressed
in vivo, is involved in controlling bacterial membrane lipid
composition, and is a new virulence factor. 

Results and Discussion 

Genome assembly and annotation 

The genomes of Kp52.145, SB2390 and SB3193 were 
sequenced by a combination of 454 and Illumina tech- 
nologies using single and paired-end libraries. Finishing 
efforts resulted in the complete genome sequence of 
K. pneumoniae Kp52.145 (one chromosome + two plas- 
mids), comprising 5.45 Mbp and 5,314 protein coding genes 
(Figure 1). SB2390 and SB3193 genomes were assembled in 
11 and 17 scaffolds, respectively. The GC% of these three ge- 
nomes ranged from 55.6% to 56.7%. The general features 
of the K. pneumoniae sequenced genomes are summa- 
rized in Table 1. 
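
The GC% values reported for each genome are plain base-composition ratios. As a minimal sketch (the function name and the toy fragment below are illustrative assumptions and are not taken from the original analysis), G + C content could be computed as follows:

```python
def gc_percent(sequence: str) -> float:
    """Return the G+C percentage of a nucleotide sequence.

    Only unambiguous bases (A, C, G, T) are counted, so N's and other
    ambiguity codes do not distort the ratio.
    """
    seq = sequence.upper()
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * (counts["G"] + counts["C"]) / total


# Hypothetical 20-bp fragment, not derived from any of the sequenced strains.
toy_fragment = "GCGCGATATCGGCCATGCAA"
print(round(gc_percent(toy_fragment), 1))
```

For a multi-replicon genome, the same function would be applied to the concatenation of all replicons, or to each replicon separately if per-replicon values are wanted.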

Because Kp52.145 is a highly virulent strain, our analyses were
focused on comparing its genome with the genomes of
K. pneumoniae strains SB2390, SB3193, NTUH-K2044 and MGH78578.
According to SEED subsystems annotations [25], about 60% of
protein-coding genes for each K. pneumoniae genome had predicted
functions. More specifically, the largest percentage of annotated
genes is involved in the metabolism of carbohydrates
(approximately 20%), of amino acids and their derivatives
(approximately 10%) and of cofactors, vitamins and prosthetic
groups (approximately 8%) [see Additional file 1]. 

Common genome 

To define the common genome of the five strains, we 
used stringent BlastClust parameters of at least 90% 
identity and at least 80% coverage. This analysis identified 
3,587 coding sequences common to the five genomes. The 
majority of proteins are involved in metabolic processes, 
such as energy metabolism and transporters, supporting 
the general concept that the core genome encompasses 
essential functions required for survival of the micro- 
organism. The K. pneumoniae core genome comprised 
several sets of genes whose functions are related to bacter- 
ial survival in the environment or interaction with its host, 
and possibly virulence. This was the case, for example, of 
genes involved in quorum-sensing and biofilm formation, 
adhesins, and secretion systems, for which examples are 
detailed below. 
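
The core-genome definition used above amounts to keeping only those protein clusters that have a member in every strain. The following is a minimal sketch of that step, assuming a one-cluster-per-line text listing of sequence IDs; the "<strain>|<gene>" ID convention, the strain labels and the toy clusters are hypothetical and only serve the illustration:

```python
from typing import Iterable, List, Sequence, Set

# Hypothetical labels for the five genomes; IDs are assumed to look like "<strain>|<gene>".
STRAINS = ("Kp52.145", "SB2390", "SB3193", "NTUH-K2044", "MGH78578")


def parse_clusters(lines: Iterable[str]) -> List[List[str]]:
    """Read one cluster per line, members separated by whitespace."""
    return [line.split() for line in lines if line.strip()]


def strains_in(cluster: Sequence[str]) -> Set[str]:
    """Strains contributing at least one protein to the cluster."""
    return {member.split("|", 1)[0] for member in cluster}


def core_clusters(clusters: Iterable[Sequence[str]],
                  strains: Sequence[str] = STRAINS) -> List[Sequence[str]]:
    """Keep the clusters that contain a protein from every strain."""
    required = set(strains)
    return [c for c in clusters if required <= strains_in(c)]


# Toy clustering output (made-up IDs): one core cluster, two accessory ones.
toy_output = [
    "Kp52.145|g1 SB2390|g7 SB3193|g3 NTUH-K2044|g9 MGH78578|g2",
    "Kp52.145|g5 NTUH-K2044|g4",
    "Kp52.145|g8",
]
print(len(core_clusters(parse_clusters(toy_output))))
```

The identity and coverage thresholds themselves are applied by the clustering program; this sketch only interprets its output.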

Genes encoding autoinducer-2 and type III fimbriae, involved in
biofilm formation in K. pneumoniae [26-28], were present in all
sequenced strains. In addition, genes coding for synthesis and
transport of the poly-β-1,6-N-acetyl-D-glucosamine (PGA) adhesin
(KpST66_4915 to KpST66_4918 in the Kp52.145 genome), which is
required for the structural stability of Escherichia coli
biofilms [29], and YidE (KpST66_0019), which mediates the
hyperadherence phenotype of E. coli [30], were also found in the
core genome of K. pneumoniae strains. 

Moreover, the five K. pneumoniae genomes contained the genes
barA/uvrY (respectively KpST66_0986 and KpST66_3517) and
ycjX/ycjF (KpST66_2441 and KpST66_2442) that may be involved in
bacterial fitness and virulence. The two-component system
BarA/UvrY (KpST66_3517 and KpST66_0986 in Kp52.145 genome)
contributes to biofilm formation in Salmonella enterica and is a
virulence determinant of urinary tract E. coli infections
[31,32]. The E. coli YcjF protein is expressed in a septicemia
murine model of infection in which the ycjF mutant is attenuated,
thus suggesting its implication in the in vivo
survival/multiplication of the bacteria [33]. 

Figure 1 K. pneumoniae Kp52.145 genome. A) Chromosome representation: Outermost layers in gray indicate the position of positive and 
negative strand CDSs. tRNAs are represented in green, while the four virulence-related genomic islands (GI) are in red, the locus coding for 
anaerobic metabolism in orange, the T6SS gene clusters in blue and capsular synthesis region in yellow. Inner circle represents G + C%. 
Detail of the T6SS locus III region containing the putative phospholipase (pld1 gene) (orange) and sel1 (pink) genes is shown enlarged. 
B) Plasmids representation: plasmid maintenance genes and IS sequences are shown in blue, proteins with unknown functions in gray, 
known functions in green, toxin-antitoxin systems in yellow and rmpA in red. CDSs, coding sequences. 

Table 1 General features of the K. pneumoniae genomes analyzed 

K-type  Strain bank ID  Scaffolds  Size (Mb)  %GC    Total PCG  rRNA operons  tRNA genes  % coding  Plasmids  Reference
K2      3341/Kp52.145   3          5.45       56.4   5,314      8             85          88        2         This manuscript
K2      2390            11         5.62       56.74  5,367      6             77          86.8      ?         This manuscript
K1      3193            16         5.01       55.6   4,793      10            73          92        ?         This manuscript
K1      NTUH (ST23)     2          5.24       57.6   4,992      19            76          89        1         Wu et al. [63]
K5      MGH (SB107)     6          5.31       57     4,776      10            56          85        5         (NC_009648)

The three genomes sequenced in this study are compared to strains NTUH and MGH, previously published. PCG, protein coding genes. 

Several putative secretion systems were identified as part of
the common genome of the K. pneumoniae strains, including one
type I secretion system (T1SS) and one type II secretion system
(T2SS). T2SS is composed of the pullulanase related genes pulA-O
that are involved in the pathogenesis of several bacteria
[34,35]. Streptococcus pyogenes PulA binds to host lung glycogen
leading to a strong interaction with alveolar type II cells [36].
Similarly, the alpha-amylase AmyA degrades glycogen into cyclic
maltodextrins, which increases the transepithelial translocation
of Streptococcus [37]. Both amyA and pulA-O genes are encoded in
K. pneumoniae genomes, but their functions in this bacterium
remain to be characterized. Type VI secretion system (T6SS)
putative genes were located in at least three different loci of
the K. pneumoniae genomes, in accordance with a previous in
silico study [38]. T6SS clusters are usually found within
pathogenicity islands or on chromosomal regions presenting
virulence or host survival biases. Additionally, T6SS has been
suggested to assist colonization and infection. Indeed, in a
screen that identified K. pneumoniae mutants failing to colonize
mice [39], two of them were mutants in genes coding for proteins
annotated as hypothetical that have been subsequently
re-annotated as T6SS proteins [38]. Type III secretion systems
were not found in Klebsiella, but type IV secretion systems,
possibly corresponding to conjugation apparatus, were present
only in some strains (Kp52.145, SB2390 and NTUH-K2044) (Table 2). 
Furthermore, the core genome presented a large genomic region
located between the gly-tRNA and phe-tRNA loci in the Kp52.145
genome, containing the frdABCD genes coding for the fumarate
reductase enzymatic complex responsible for fumarate respiration
under anaerobic growth of bacteria. This complex is a virulence
determinant for Helicobacter pylori, Mycobacterium tuberculosis,
Actinobacillus pleuropneumoniae and S. enterica, as mutants in
these genes are attenuated [40-43]. The ability to grow
anaerobically allows bacterial pathogens to persist in host
tissues, including in the lungs. Interestingly, in K. pneumoniae
this genomic locus encoded at least one more protein involved in
anaerobic metabolism, the anaerobic C4-dicarboxylate transporter
DcuA (KpST66_4904), supporting the idea that this GI provides
K. pneumoniae advantages to grow under anaerobic conditions,
possibly favoring infection. Additional proteins that might be
involved in bacterial fitness to environmental stress conditions
were encoded in this region. For instance, the putative small
multidrug resistance protein SugE (KpST66_4882) has been shown to
regulate biofilm formation and capsule expression [44]. A
lipocalin-2 bacterial protein, Blc (KpST66_4881), is also encoded
in this island of the genome. Eukaryotic lipocalins are small
extracellular proteins that bind hydrophobic ligands and fulfill
numerous biological functions including regulation of 

Table 2 Distribution of virulence-related factors among K. pneumoniae genomes analyzed 

Virulence-factor  SB3341  SB2390  SB3193  NTUH  MGH  Functional role



rmpA 

Aerobactin 

Enterobactin 

Yersiniabactin 

Colibactin 

T4SS (virB) 

T2SS 

T6SS 

Pld-family 

Sel1 lipoproteins 

cOMP 

Igg-like 

SEFIR-domain 

Blc 



++ 
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+ 
+ 



++ 
+ 
+ 
+ 

+ 
+ 
+ 



Regulator of capsule expression 
Siderophore 
+ Siderophore 
Siderophore 
Genotoxin 

Conjugative machinery/protein secretion 
+ Protein secretion 

+ Protein secretion 

Lipid metabolism 

Unknown 

Putative cytotoxin 

Binding to extra cellular matrix compounds 

Potentially hijack IL17R signaling pathways 

+ Binding to hydrophobic ligands / putative 

regulation of homeostasis and immunity 

cellular homeostasis and immunity and are regulators of
antibacterial defense [45]. Lipocalin 2 is, for instance, a
siderophore scavenger for several bacteria, including
K. pneumoniae [46], as well as a negative regulator of the
inflammatory response during Streptococcus pneumoniae pneumonia
[47]. However, the role of bacterial lipocalins is not yet
known. 

Finally, a phosphatidylserine decarboxylase (KpST66_4873) 
might be important for the integrity of the bacterial 
membrane composition, as phospholipid biosynthetic 
pathways play crucial roles in the virulence of several 
pathogens [48,49]. 

Accessory genome 

The accessory genome (genes absent in at least one of 
the five strains) included a large number of specific cod- 
ing sequences (CDSs): 743 genes were found only in 
Kp52.145, 608 in NTUH-K2044, 806 in SB3193, 635 in 
SB2390 and 488 in MGH78578. About 50% of these 
genes code for hypothetical proteins or proteins of un- 
known function. The distribution of the putative virulence- 
related genes of K. pneumoniae among the sequenced 
strains is summarized in Table 2. The specific regions or 
genes of Kp52.145 are detailed below. 
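
The accessory and strain-specific gene counts above follow from simple set operations on per-strain gene-family content. A minimal sketch is given below, under the assumption that each strain has already been reduced to a set of gene-family identifiers; the strain labels "A", "B", "C" and the family names are made up for the illustration:

```python
from typing import Dict, Set


def accessory_families(content: Dict[str, Set[str]]) -> Set[str]:
    """Families absent from at least one strain (pan-genome minus core)."""
    all_sets = list(content.values())
    pan = set().union(*all_sets)
    core = set.intersection(*all_sets) if all_sets else set()
    return pan - core


def strain_specific(content: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """For each strain, the families found in no other strain."""
    specific = {}
    for strain, families in content.items():
        others = set().union(*(f for s, f in content.items() if s != strain))
        specific[strain] = families - others
    return specific


# Toy gene-family content for three hypothetical strains.
toy_content = {
    "A": {"f1", "f2", "f3"},
    "B": {"f1", "f4"},
    "C": {"f1", "f2"},
}
print(sorted(accessory_families(toy_content)))
print({s: sorted(f) for s, f in strain_specific(toy_content).items()})
```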

Kp52.145 plasmids 

Strain Kp52.145 possessed two plasmids (Figure 1B). The first
plasmid of 121 kb carried the aerobactin cluster and the
regulator of mucoid phenotype rmpA genes. The presence of this
plasmid was previously correlated with the virulence of
K. pneumoniae K1 and K2 isolates [21,50]. Additionally, a strain
cured of this plasmid showed a 6 × 10^4-fold reduction in
virulence, establishing the link between this plasmid and
bacterial virulence [21]. In addition to aerobactin and rmpA,
this plasmid contained genes coding for F-pilin, purine
metabolism, insertion sequences and proteins of unknown
functions. Kp52.145 also carried a second, previously
undescribed, plasmid of about 90 kb. Interestingly, it contained
rmpA2, a homologue of the regulator of mucoid phenotype rmpA,
which seems to be involved in capsule expression regulation
[51,52]. F-pilin genes, a subtilisin-related serine protease, an
AAA+ ATPase, the UV protection system UmuD/UmuC and the genes
encoding the toxin-antitoxin systems RelE/orf4 and VagD/VagC
(Figure 1B) were also found on this plasmid. However, the
potential role of these genes in virulence remains to be
investigated. 

Genomic islands identified in the genome of strain 
Kp52.145 

In addition to the capsule synthesis operon and the
iron-acquisition systems (yellow and orange boxes in Figure 1A,
respectively), known to be involved in K. pneumoniae virulence,
four additional regions of the Kp52.145 genome presented several
characteristics of pathogenicity islands (red boxes, Figure 1A).
These regions had a GC content different from the genome
average, represented large chromosomal regions (often > 30 kb),
were associated with tRNA genes or with the presence of
insertion sequences, integrases and transposases, and were
present in pathogenic strains while less frequent in
less-virulent strains [53,54]. The four GIs present in the
Kp52.145 genome were present or partially present in NTUH, but
none of them was present in SB2390, SB3193 and MGH 78578,
indicating that their occurrence is specific to pathogenic
genomes. Figure 2 shows the four GIs identified, highlighting
the putative virulence-related genes. 
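
One of the island criteria listed above, a GC content that departs from the genome average, lends itself to a sliding-window scan. The sketch below is illustrative only: the window size, step, deviation threshold and the synthetic sequence are assumptions, and the text does not state that the islands were located with such a scan.

```python
from typing import List, Tuple


def gc_fraction(seq: str) -> float:
    """G+C fraction (0 to 1) of a sequence; 0.0 for an empty string."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def atypical_gc_windows(genome: str, window: int, step: int,
                        min_deviation: float) -> List[Tuple[int, int, float]]:
    """Return (start, end, gc) for windows whose GC fraction differs from
    the whole-genome GC fraction by at least `min_deviation`."""
    genome_gc = gc_fraction(genome)
    flagged = []
    for start in range(0, len(genome) - window + 1, step):
        gc = gc_fraction(genome[start:start + window])
        if abs(gc - genome_gc) >= min_deviation:
            flagged.append((start, start + window, gc))
    return flagged


# Synthetic 300-bp "genome": GC-rich flanks around an AT-only insert.
toy_genome = "GC" * 50 + "AT" * 50 + "GC" * 50
print(atypical_gc_windows(toy_genome, window=100, step=100, min_deviation=0.5))
```

On real data, flagged windows would still need to be checked against the other criteria (size, neighbouring tRNA genes, mobility genes and distribution among strains) before being called genomic islands.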

Genomic island 1 (GI-I): ICEKp1-like region 

The largest GI found in the Kp52.145 genome comprised 133,679
bp, presented a GC content of 52%, was inserted in an asn-tRNA
locus and encoded 92 CDSs (Figure 2A). Most of the
protein-coding genes found in this region were described as part
of the ICEKp1 GI of NTUH-K2044 [55,56], although several
differences were found. Kp52.145 GI-I begins at the asn-tRNA
locus, followed by several uncharacterized proteins and
insertion sequence elements. GI-I coded for the synthesis of two
polyketide/nonribosomal peptides (yersiniabactin and colibactin)
and for the conjugative transfer machinery (T4SS) that allows
horizontal transfer of the island [56]. In contrast to the
previous description of ICEKp1 [56], the Kp52.145 GI-I carried
colibactin and did not contain the region coding for vagC-vagD,
iroN-iroB-iroC-iroD and rmpA genes, which are carried only by
the 121 kb plasmid in Kp52.145. 

Genomic island 2 (GI-II) 

We describe here a novel GI (GI-II). It is a 29,829 bp island
with a GC content of 49%, coding for 28 CDSs and inserted in a
leu-tRNA locus (Figure 2B). Potential pathogenesis-related genes
coded for a putative cytotoxic outer membrane protein (cOMP,
KpST66_4736) and a polyamine ABC transport system (KpST66_4729
to KpST66_4732). The closest known homologue of cOMP (34%
identity) was a Plesiomonas shigelloides predominant virulence
factor proposed to trigger cell death in host cells following
infection [57]. Polyamine biosynthesis and transport mechanisms
are intricately linked to fitness, survival, biofilm formation
and pathogenesis, for instance in S. pneumoniae and Yersinia
pestis [58,59]. Additionally, this GI encoded a 4-phytase gene
(KpST66_4736), ugpQ3 (KpST66_4728) and xylA, xynT, xynB, xylR
(KpST66_4724 to KpST66_4727) that are involved in xylose
metabolism. 

Genomic island 3 (GI-III) 

The third GI is characterized by a 49,657 bp region presenting a
G + C content of 51% and containing 66 CDSs, most of them coding
for phage structural proteins (Figure 2C). This GI included
genes coding for proteins with homologues that were shown to
play a role in bacterial adhesion and immune system escape
[60-62]. These genes encoded an immunoglobulin domain-containing
protein (KpST66_1506), a peptidase S24-like protein
(KpST66_1511), two HNH family endonucleases (KpST66_1468 and
1486) and an exonuclease (locus 1464). 

Figure 2 Genomic islands (GIs) identified in the genome of Kp52.145. Positive and negative strand CDSs are represented in gray. Putative 
virulence-related genes are highlighted in different colors. The size of each GI is given in kb. Panel A: GI-I, B: GI-II, C: GI-III and D: GI-IV. CDS, 
coding sequences. 

Genomic island 4 (GI-IV) 

The fourth island (GI-IV) was mainly composed of phage-related
genes. Among the 42 CDSs encoded within this genomic region, one
gene coded for a SEFIR-domain containing protein (KpST66_1945;
Figure 2D). A SEFIR domain is usually found in IL17 receptors
and SEF proteins, acting in eukaryotic signaling pathways. Very
little is known about prokaryotic SEFIR-containing proteins.
Structural analyses suggested that these bacterial SEFIR domains
share structural and electrostatic similarity with their
mammalian homologues and, thereby, could potentially subvert
host immunity by hijacking the IL17R signaling pathways [63].
Notably, local production of IL-17 is a significant factor in
effective host defense against Gram-negative bacteria, including
K. pneumoniae [64]. Therefore, further studies are required to
elucidate whether KpST66_1945 is implicated in K. pneumoniae
pathogenesis. 

Distribution of GIs among K. pneumoniae strains 

To investigate the distribution of the described GIs in
K. pneumoniae, the presence of the putative virulence-related
genes was searched using BlastN in 171 genomes, including 119
publicly available and 52 unpublished genomes (Bialek-Davenet,
Brisse et al., unpublished work) representing 47 different
sequence types (STs). Whereas the SEFIR-domain containing
protein gene of GI-IV was only found in two (1.2%) isolates of
sequence type ST15, the three other GIs described herein were
more widely distributed among K. pneumoniae isolates (Table 3).
GI-II genes were present in a total of 11 (6.4%) isolates, most
of which belonged to ST375, ST65 and ST25, which were associated
with severe infections caused by isolates of capsular serotype
K2 [9,65]. The genes of GI-II were always found in synteny.
GI-III genes were observed in only seven (4.1%) isolates
dispersed in several unrelated STs. The distribution of the
ICEKp1 element (similar to GI-I) has been previously analyzed
[55]. 
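
The percentages quoted above are isolate counts divided by the 171 genomes screened, and the Table 3 footnote gives the presence criteria (>90% identity and >50% coverage of the gene length). A minimal sketch of both steps is shown below, assuming BlastN tabular output (-outfmt 6, where the third and fourth columns are percent identity and alignment length); the helper names, the toy hit line and the gene length are hypothetical:

```python
from typing import Iterable


def passes_thresholds(pident: float, aln_len: int, gene_len: int,
                      min_identity: float = 90.0,
                      min_coverage: float = 50.0) -> bool:
    """Apply the >90% identity and >50% gene-length coverage criteria."""
    coverage = 100.0 * aln_len / gene_len
    return pident > min_identity and coverage > min_coverage


def gene_detected(blast_lines: Iterable[str], gene_len: int) -> bool:
    """True if any tabular BlastN hit meets both thresholds.

    Simplification: each hit is judged on its own; separate HSPs of the
    same gene are not merged before computing coverage.
    """
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        pident, aln_len = float(fields[2]), int(fields[3])
        if passes_thresholds(pident, aln_len, gene_len):
            return True
    return False


def prevalence_percent(n_positive: int, n_genomes: int) -> float:
    """Share of genomes carrying the gene, in percent."""
    return 100.0 * n_positive / n_genomes


# Made-up hit: 97.5% identity over 600 aligned positions of a 1,000-bp gene.
toy_hits = ["geneX\tgenome42\t97.5\t600\t15\t0\t1\t600\t1000\t1599\t0.0\t1000"]
print(gene_detected(toy_hits, gene_len=1000))

# Counts taken from the text: 11, 7 and 2 positive isolates out of 171.
for n in (11, 7, 2):
    print(n, round(prevalence_percent(n, 171), 1))
```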

T6SS locus III insertion 

Recently, three different T6SS loci were defined in
K. pneumoniae [38]. Within these loci, three putative
valine-glycine repeat (Vgr) proteins and two
hemolysin-coregulated proteins (Hcp) were described as potential
effector proteins, through their sequence similarities to Vibrio
cholerae and Pseudomonas aeruginosa effector proteins [65-68].
Accordingly, the Kp52.145 genome also presented three conserved
T6SS loci syntenic to those previously described. The first two
loci were identical in composition and orientation to the
previously described ones. The third one, locus III, had
conservation of adjacency limited to the imcF/impG/impH and
impJ/ompA/vgrG gene clusters, as a region with nine genes was
inserted between these two clusters. This insertion encoded one
hypothetical protein, five putative Sel-1 repeat containing
lipoproteins and three putative phospholipase D family proteins
(Figure 1A). Flanking sequences suggesting how this region was
inserted were not found. 

Table 3 Prevalence of the genomic islands among K. pneumoniae isolates 

Genomic island  Virulence-related features  Prevalence (%)  Remarkable STs
GI-I            Colibactin                  3.5 a
                Yersiniabactin
GI-II           cOMP                        5.8             375, 65, 25
                4-phytase
GI-III          Igg-like                    8.1             Dispersed
                Exonuclease                 57.6            375, 65
GI-IV           Sefir-domain                1.7
T6SS insertion  PLD1                        7.0             380, 35
                Sel1                        7.0             380, 35

a According to ref. [55]. Prevalence of virulence-related features encoded in GI-II, GI-III, GI-IV and T6SS insertion was based on >90% identity and >50% coverage in 
the length of the genes, using a database of 171 Klebsiella genomes representative of 47 different STs. STs, sequence types. 

Lery et al. BMC Biology 2014, 12:41 
http://www.biomedcentral.com/1741-7007/12/41 

Sel1 lipoproteins are poorly characterized and there was no
evidence for their function in K. pneumoniae, but they are
essential in Legionella pneumophila for invasion of host cells,
where they influence vacuolar trafficking of bacteria [69]. The
three open reading frames encoding putative phospholipase D
proteins in strain Kp52.145 encoded one full length protein
(KpST66_3368, 623 amino acids), one C-terminal region
(KpST66_3371, 187 aa) and one N-terminal region (KpST66_3372,
317 aa). Phospholipase D family proteins have been described as
important for host cell invasion, bacterial dissemination and
disease progression [70-73]. The bacterial phospholipase D
family comprises at least four classes of proteins with distinct
functions: true phospholipase D, cardiolipin synthase,
phosphatidylserine synthase and endonuclease [74]. Full length
K. pneumoniae PLD1 and its closest homologs all presented the
conserved motif HxK(x)4D and a serine or threonine approximately
eight residues after asparagine (Figure 3), but no other
conserved domain was described in each family, thus making it
difficult to infer protein function only by sequence analysis. 
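
The conserved HxK(x)4D motif mentioned above can be located in a protein sequence with a regular expression. The sketch below is only an illustration: the function name is an assumption and the toy sequence is invented, not the PLD1 sequence.

```python
import re
from typing import List, Tuple

# H, any residue, K, any four residues, D. The lookahead lets overlapping
# occurrences be reported as well.
HKD_MOTIF = re.compile(r"(?=(H.K.{4}D))")


def find_hkd_motifs(protein: str) -> List[Tuple[int, str]]:
    """Return (1-based start position, matched residues) for each motif."""
    return [(m.start() + 1, m.group(1))
            for m in HKD_MOTIF.finditer(protein.upper())]


# Invented 23-residue sequence carrying two copies of the motif.
toy_protein = "MAHAKLLLLDGGGHWKAAAADSS"
print(find_hkd_motifs(toy_protein))
```

Counting the matches gives the number of catalytic motifs carried by a protein, which is the kind of feature used to sort phospholipases into sub-families.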

In order to obtain evidence that the products of these three
genes, coding either for full length or partial PLD-family
proteins, are important for bacterial survival in vivo, we
checked their mRNA expression by RT-PCR. We observed that these
genes are expressed both in bacteria grown for four hours in
Trypto Casein Soy broth (TCS) medium and in the lungs of mice
infected for 24 hours (data not shown). These results prompted
us to further check for a putative involvement of the full
length PLD gene, called pld1, in K. pneumoniae virulence. 

Involvement of the phospholipase D family protein gene 
pld1 in Kp52.145 virulence 

As PLD-family proteins have been shown to be involved in
virulence [75-77], we decided to characterize the role of the
full length PLD1 protein. We first tested a pld1 mutant strain
in a K. pneumoniae murine model. Mice were infected intranasally
with 10^8 bacteria of either the wild-type strain, a pld1
mutant, or the mutant strain complemented with a plasmid
expressing the putative PLD1 protein (pPLD), and their survival
was monitored for seven days. Interestingly, the mutant strain
appeared avirulent in a mouse model of acute pneumonia, while
mice infected with the wild-type and the complemented strain
succumbed in less than one week (Figure 4). However, the
wild-type, the mutant and the complemented mutant strains grew
equally well in Luria-Bertani (LB) broth (data not shown). These
results indicated that the pld1 mutant was strongly attenuated
in vivo, thus showing an important role for PLD1 in virulence. 



To analyze the frequency and clonal distribution of the pld1
gene in K. pneumoniae, the 171 genomes were analyzed using
BlastN. We observed that besides ST66, represented by strain
Kp52.145, pld1 was present in 10 strains (6.4%) belonging to
ST380, ST679, ST67 (K. pneumoniae subsp. rhinoscleromatis) and
ST35, but in none of the other isolates. It is interesting to
note that ST380 was associated with severe K. pneumoniae
infections [9,65] and that K. pneumoniae subsp.
rhinoscleromatis is the only Klebsiella strain able to survive
intracellularly in macrophages [78]. 

Functional characterization of PLD1 

In order to demonstrate the phospholipase activity of PLD1 and
characterize its involvement in lipid metabolism, the lipid
composition of wild-type and mutant strains was analyzed by
thin-layer chromatography (TLC). A remarkable lipid spot was
absent from the pld1 mutant in comparison with the complemented
strain, suggesting that the putative PLD1 is involved in lipid
metabolism [see Additional file 1]. To reinforce this result, a
plasmid carrying the pld1 gene was introduced into E. coli
strain SD9 [79]. This strain is deficient in phosphatidylserine
and cardiolipin, thus presenting a simpler lipid composition
than its parental strain and Kp52.145. The lipid profiles of SD9
and the complemented strain differed. Notably, the
PLD1-expressing strain contained an additional lipid spot in
comparison to the SD9 strain, suggesting that PLD1 is
responsible for this difference (Figure 5A). The SD9 wild-type
strain also presented an extra lipid spot in comparison to the
PLD1-expressing strain, possibly representing the PLD1 substrate
(Figure 5A). Densitometric analysis of iodine-stained lipids on
TLC plates revealed that this lipid spot corresponded to 21% of
the total amount of lipids in the SD9 strain, but only 6% in SD9
complemented with a plasmid expressing pld1. Mass spectrometry
(MS) analysis of the total lipid extract was carried out to
identify this lipid. Comparing lipid profiles by MS, we found a
lipid of mass 788.4 present only in the PLD1-deficient strain
(Figure 5B) and identified it as phosphatidylglycerol (PG) using
the LipidMaps database. 

As mentioned above, the bacterial PLD-family proteins 
can be classified in four subfamilies. One of them, the 
cardiolipin synthase is able to convert two PG molecules 
into glycerol and cardiolipin, or to catalyze the opposite 
reaction, leading to PG formation [80]. Our results 
suggest that PLD1 belongs to the cardiolipin synthase 
subfamily and that it plays a role in balancing the PG 
and cardiolipin content. 

It has been shown that humans and mice with bacterial 
pneumonia have markedly elevated amounts of cardiolipin 
in lung fluid and that it impairs surfactant function, lung 
mechanics, modulation of cell survival and cytokine net- 
works and lung consolidation [81]. 

Figure 3 Sequence alignment of Kp52.145 (Kp) PLD1 and its closest related sequences: putative phospholipase D family protein from 
Pseudomonas syringae (PS) and cardiolipin synthases from P. putida (PP), Staphylococcus aureus (SA), Bacillus subtilis (BS) and E. coli (EC). 
The rectangles indicate the phospholipase D active site regions and the asterisk points to the corresponding transposon insertion site in the pld1 
mutant strain. 

Figure 4 PLD1 involvement in K. pneumoniae virulence. Mice survival after infection with K. pneumoniae Kp52.145 wild-type, pld1 mutant and 
complemented strains. Data are representative of seven mice per group from two independent experiments. Standard deviation is shown. PLD, 
phospholipase D. 

There is evidence that bacteria are able to adjust their 
relative concentrations of phosphatidylethanolamine and 
PGs when subjected to environmental stresses. Such an 
alteration in headgroup composition seems to be a 
means for changing membrane permeability and, hence, 
preserving stability [82]. Therefore, we hypothesize that 
PLD1 alters the membrane composition and charge, af- 
fecting bacterial interaction with the host environment. 

Recently, Russell et al. demonstrated that diverse phospholipase
proteins encoded within the T6SS loci of several
prokaryotic genomes are antibacterial effectors, conferring
competitive advantages on the donor strain during
interbacterial interactions [83]. These proteins are generally
designated as 'T6SS Lipase Effectors' (Tle) and classified
into five sub-families, according to sequence conservation
and the number of catalytic motifs present. The Kp52.145
PLD-family protein could be considered a Tle5 member,
as it presents two conserved HxKxxxxD motifs. However,
in the Kp52.145 genome we did not identify any gene
similar to the cognate immunity genes, a hallmark of the
genomic islands described by Russell et al. Moreover, we
did not observe such an antibacterial effect of Kp52.145 or
its pld1 mutant strain upon competition with E. coli [see
Additional file 1]. These results show that pld1 is implicated
in virulence without being an antibacterial factor
and is, so far, unique.
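
The motif criterion mentioned above (two conserved HxKxxxxD motifs) can be illustrated with a short pattern search. The following is a minimal sketch only: the function name and the toy sequence are invented for illustration, and the sequence is not that of PLD1.

```python
import re

# HxKxxxxD: His, any residue, Lys, any four residues, Asp.
# The lookahead lets overlapping occurrences be reported as well.
HKD_MOTIF = re.compile(r"(?=(H.K.{4}D))")


def find_hkd_motifs(protein_seq: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched residues) for each motif."""
    seq = protein_seq.upper()
    return [(m.start() + 1, m.group(1)) for m in HKD_MOTIF.finditer(seq)]


# Toy sequence, invented for illustration (hypothetical, not PLD1).
toy_seq = "MAAHYKLMVIDGGSTTAAHSKVLLVDPPA"
hits = find_hkd_motifs(toy_seq)
for position, residues in hits:
    print(position, residues)
```

A protein carrying two such hits would satisfy the motif count described for Tle5 members; sequence conservation, the other classification criterion, is not addressed by this sketch.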

Conclusions 

This study presents a comparative analysis of the complete
genome sequence of the high-virulence reference strain
Kp52.145, a derivative of the K2 reference strain B5055. It
revealed five genomic regions possibly involved in bacterial
virulence. One gene, pld1, was shown to be involved in
virulence in a mouse model of pneumonia and revealed a
novel implication of lipid metabolism in K. pneumoniae
pathogenesis. Future analysis of additional putative virulence
factors such as Sel1 lipoproteins, VgrG, Hcp, Bel,
cOMP and Sefir-domain containing protein is required
for a comprehensive understanding of K. pneumoniae core
virulence genes.

Methods 

Selection of isolates for genome sequencing/bacterial 
strains 

Strain Kp52.145 (a derivative of B5055, the reference
strain of serotype K2) is a laboratory strain used to study
K. pneumoniae pathogenesis [11,21] and was chosen as the
focus of this work. SB2390 (curl 5505, isolated in Curacao in
2002 from a urinary tract infection; belongs to ST14) and SB3193
(IPEUC-744, isolated in France in 1981 from a metritis case in
a mare; belongs to ST82) [23] are non-virulent strains
that were sequenced to allow comparison between virulent
and non-virulent strains.

The pld1 transposon mutant was isolated by Anna
Tomas and Jose Bengoechea during a screen of a K.
pneumoniae mutant library made by Tn5 transposon
insertion (manuscript in preparation). In this mutant, the
transposon was inserted at position 1,625 of the pld1 gene.

Complementation of the mutant strain was achieved
through bacterial transformation using electrocompetent
cells and a plasmid carrying the pld1 gene. The pld1
gene was amplified by PCR and cloned at the multiple
cloning site of the pUC18 plasmid. Cloning was confirmed by
DNA sequencing.

Figure 5 PLD1 involvement in lipid metabolism. A) TLC lipid profiles of E. coli strain SD9 and SD9 + pPLD. Red circle indicates phosphatidyl
glycerol. Blue circle represents the lipid present specifically upon pld1 expression. B) Mass spectrometry profiles of E. coli strain SD9 and SD9 + pPLD
lipids. The red arrow points to m/z 789.4 exclusively found in the wild-type strain. TLC, thin-layer chromatography.

Genome sequencing and assembly 

K. pneumoniae strains were sequenced using a combination
of 454 and Illumina reads. Single and paired-end
454 reads with an average length of 400 nucleotides were
assembled into contigs and scaffolds by Newbler. Illumina
reads of about 76 or 36 nucleotides were aligned to the
scaffolds in order to confirm and correct possible
homopolymer errors in the 454 reads. Coverage was as follows:
Kp52.145 genome: 170× using GAIIx (76 nt) + 13.8×
using MP Titanium + 18× using SR Titanium; SB2390
genome: 81× using GAIIx (36 nt) + 6.4× using MP
Titanium + 22× using SR Titanium; SB3193 genome:
209× using GAIIx (76 nt) + 5.4× using MP
Titanium + 20× using SR Titanium.
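
As a simple check on the figures above, the per-library coverage values can be summed for each genome. The sketch below uses only the numbers listed in the text; the dictionary keys, the function names and the generic `fold_coverage` helper (reads × read length / genome size) are illustrative assumptions and were not part of the original analysis.

```python
# Fold coverage per library, copied from the text above.
# Keys are illustrative labels for the GAIIX, MP titanium and SR titanium runs.
coverage = {
    "Kp52.145": {"illumina_gaiix": 170.0, "454_mp": 13.8, "454_sr": 18.0},
    "SB2390": {"illumina_gaiix": 81.0, "454_mp": 6.4, "454_sr": 22.0},
    "SB3193": {"illumina_gaiix": 209.0, "454_mp": 5.4, "454_sr": 20.0},
}


def total_coverage(libraries: dict[str, float]) -> float:
    """Sum fold coverage over all libraries of one genome."""
    return round(sum(libraries.values()), 1)


def fold_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Generic estimate: total sequenced bases divided by genome size."""
    return n_reads * read_length / genome_size


totals = {strain: total_coverage(libs) for strain, libs in coverage.items()}
for strain, total in totals.items():
    print(f"{strain}: {total}x combined coverage")
```

Summed this way, the combined coverage is 201.8× for Kp52.145, 109.4× for SB2390 and 234.4× for SB3193.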

Following the primary assembly of the genomes, an
in silico finishing approach, based on the methods
described by Pop et al. [84], was performed in order to
identify small, identical repeats in the genomes. In
such cases, and in the absence of discrepancies between
high-quality bases, the contigs containing the small repeats
were manually duplicated and added to the assembly.

Scaffolds were aligned to experimentally determined
BglII optical maps generated by OpGen from
purified chromosomal DNA, using MapSolver version 3.1
software. These alignments were used to check the quality
of the assemblies. Additionally, specific pairs of primers
were designed in order to close all remaining gaps. PCR
products were purified in NucleoFast 96 plates (Macherey
Nagel, Düren, Germany) and aliquots were used for
sequencing reactions with the BigDye Terminator Cycle
Sequencing Ready Reaction Kit (Applied Biosystems, Foster
City, CA, USA) on an ABI Prism 3730XL DNA Analyzer
(Applied Biosystems). The resulting sequences were added
to the previous assembly using Phred/Phrap/Consed.

The sequences of the K. pneumoniae genomes have been
deposited in the European Nucleotide Archive and are
accessible under the accession numbers: FO834904, FO834905
and FO834906 (strain Kp52.145), CCBO00000000 (strain 
SB2390) and CCCQ000000000 (strain SB3193). 

Functional annotation of genomes 

In order to gain functional insights into the genome
sequences, protein-coding genes were predicted and
annotated using the CAAT-box genome browser [85],
using a combination of GeneMark predictions and Blastx
results against the UniProt database. All putative open
reading frames (ORFs) longer than 120 nucleotides that
showed similarity to sequences in the UniProt database or
a positive GeneMark result were considered for further
analysis. The final set of CDSs underwent a manual annotation
process based on description of similarity. Pfam and
COG database searches, as well as SignalP, TMHMM and
PredTMBB predictions, were performed where necessary to
improve the degree of annotation confidence. CDSs were
described as 'highly similar to', 'similar to' or 'weakly
similar to' if they presented more than 70%, between 50% and
69% or less than 50% similarity to the protein hit
sequence, respectively. Additionally, information on partial
homology was included. The start codon for each CDS was
automatically chosen and manually validated, based on a
combination of GeneMark results and Blast alignments. RAST
was used to classify proteins into functional categories.
Structural RNAs were searched using tRNAscan.
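
The ORF selection rule and the similarity wording described above can be expressed as two small functions. This is a minimal sketch rather than the annotation pipeline itself: the function names are hypothetical, and because the text does not say how values between 69% and 70% (inclusive of exactly 70%) are labelled, the sketch assumes that they fall into the 'similar to' class.

```python
def keep_orf(length_nt: int, has_uniprot_hit: bool, genemark_positive: bool) -> bool:
    """Keep ORFs longer than 120 nt with a UniProt hit or a positive GeneMark call."""
    return length_nt > 120 and (has_uniprot_hit or genemark_positive)


def similarity_label(similarity_pct: float) -> str:
    """Map percent similarity to the annotation wording used in the text.

    Assumption: values from 69% up to and including 70% are not covered
    by the stated thresholds and are treated here as 'similar to'.
    """
    if similarity_pct > 70:
        return "highly similar to"
    if similarity_pct >= 50:
        return "similar to"
    return "weakly similar to"


# Illustrative values only (not taken from the annotation).
for pct in (85.0, 60.0, 30.0):
    print(pct, similarity_label(pct))
```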

The common genome of the five strains was determined
with the BlastClust algorithm [86], using minimum
thresholds of 90% identity and 80% length coverage for
proteins to be included in the same cluster.
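
Once proteins have been clustered, the common genome corresponds to the clusters that contain at least one protein from every strain. The sketch below shows that selection step under stated assumptions: clusters are given one per line as whitespace-separated sequence identifiers, and each identifier carries a hypothetical `strain|locus` prefix. The strain labels follow the five strains named in the abstract; the identifiers are invented for illustration.

```python
# Strain labels assumed for illustration (the five strains compared in the study).
STRAINS = frozenset({"Kp52.145", "SB2390", "SB3193", "MGH78578", "NTUH-K2044"})


def core_clusters(cluster_lines, strains=STRAINS):
    """Return clusters that contain at least one protein from every strain.

    Assumes one cluster per line and identifiers of the form 'strain|locus'.
    """
    core = []
    for line in cluster_lines:
        ids = line.split()
        present = {seq_id.split("|", 1)[0] for seq_id in ids}
        if strains <= present:
            core.append(ids)
    return core


# Invented cluster lines (hypothetical identifiers).
example = [
    "Kp52.145|a SB2390|a SB3193|a MGH78578|a NTUH-K2044|a",
    "Kp52.145|b SB2390|b",
    "Kp52.145|c Kp52.145|c2 SB2390|c SB3193|c MGH78578|c NTUH-K2044|c",
]
core = core_clusters(example)
print(len(core), "clusters shared by all five strains")
```

Clusters with paralogs, such as the third example line, are kept in this sketch; whether such clusters were counted in the original analysis is not stated.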

RT-PCR 

Lungs of control or Kp52.145-infected mice were homogenized
in cold TRI reagent (Sigma, Gillingham, Dorset,
UK) using a Precellys lysing kit (Precellys, Saint Quentin
en Yvelines, France). Total lung and bacterial mRNAs
were extracted according to the manufacturer's instructions.
RNA (2 µg) was reverse transcribed into cDNA using
SuperScript II enzyme (Invitrogen, Foster City, CA, USA).
Aliquots were used in PCR reactions using specific
primers [see Additional file 1].

Animal experiments 

BALB/cJ mice were purchased from Janvier (Le Genest
St. Isle, France). Mice were housed under standard
conditions of feeding, light and temperature with access to
food and water ad libitum. Experiments were performed
according to the national and Institut Pasteur
guidelines for laboratory animal experiments. Protocols
were approved by the Institut Pasteur animal care and
use committee (protocol 05-59) and the Direction des
Services Vétérinaires de Paris (permit 75-713 to RT).
Six- to eight-week-old mice were anesthetized with
acepromazine (Calmivet, 1.5 mg/kg, Vetoquinol, Lure,
France) and ketamine (Imalgene, 31.25 mg/kg, Merial,
Lyon, France), and then infected intranasally with 20 µl of
bacterial suspension. The virulence of K. pneumoniae
strains was tested on six-week-old BALB/c mice, as
previously described [23]. Seven mice per test condition
were infected with 10^8 bacteria. Mice were followed
every day for one week. Experiments were performed at
least twice.

Thin-layer chromatography and mass spectrometry of 
lipids 

Bacterial lipids were extracted by the method of Bligh 
and Dyer [87]. Briefly, bacterial stationary-phase cultures 
were concentrated 10 times and mixed with chloroform: 
methanol. After centrifugation at 1,000 rpm for five mi- 
nutes, the organic phase was recovered. Lipid profiles 
were analyzed by two-dimensional TLC using TLC Silica 
gel 60 F254 plates (Merck, Whitehouse Station, NJ,
USA) as the stationary phase and a chloroform:methanol
(9:1) mixture as the mobile phase in both dimensions.
Staining was performed by iodine vapor. Calibrated
TLC images were acquired in ImageScanner using
LabScan v5.0 software. The relative intensity of each
spot was calculated in ImageMaster two-dimensional 
Platinum v7.0 software. Alternatively, lipid extracts were 
analyzed by MS and MS/MS in an ESI-Q-Tof Micro 
(Waters), in positive ion mode. Resolution was typically 
lower than 10 ppm. 
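
To give a sense of scale for the 10 ppm figure, the sketch below converts a parts-per-million tolerance into an absolute m/z window. It assumes that the value refers to a mass-accuracy tolerance, which the text does not state explicitly; the function names are illustrative, and m/z 789.4 is used only because it is the value reported for Figure 5.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Mass error in parts per million relative to the theoretical m/z."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def ppm_window(mz: float, tolerance_ppm: float = 10.0) -> tuple[float, float]:
    """Lower and upper m/z bounds for a given ppm tolerance."""
    delta = mz * tolerance_ppm / 1e6
    return mz - delta, mz + delta


low, high = ppm_window(789.4)
print(f"10 ppm window around m/z 789.4: {low:.4f} to {high:.4f}")
```

Under this reading, 10 ppm at m/z 789.4 corresponds to roughly ±0.008 m/z units.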

Anti-bacterial competition assays 

Competition assays were performed as previously
described [88]. Briefly, K. pneumoniae Kp52.145 or its pld1
mutant cells grown overnight on an agar plate were
resuspended in LB, normalized to an OD600 of 0.5 and mixed
at a ratio of 5:1 with a spontaneous nalidixic acid (nal)-resistant
mutant strain of E. coli MG1655. The mixture
was incubated for four hours on a prewarmed agar plate. 
Recovered cells were plated out on antibiotic selective 
media and viable cells were reported as the total number 
recovered per co-culture spot. Serratia marcescens DB10 
was used as a positive control. 

Additional file 



Additional file 1:
1) Distribution of K. pneumoniae Kp52.145 genes,
according to RAST categories. Each functional category is represented
in a different color. The total number of genes per category is shown.
2) pld PCR assay. A collection of 42 virulent and non-virulent clones was
screened by PCR for the pld gene. Strains containing the pld gene are indicated
by (+) and the absence of the pld gene by (-).
3) TLC lipid profiles of
K. pneumoniae Kp52.145 wild-type (left panel) and pld mutant strains
(right panel). Black circles indicate differentially expressed lipids.
4) Bacterial competition assay. Anti-bacterial activity was measured
as the number of E. coli cells recovered after co-culture with
K. pneumoniae Kp52.145 wild-type and pld mutant strains. S. marcescens was
used as a positive control strain.
5) List of primers used for RT-PCR analysis.